1. Introduction


The general problem considered in this note is how to locate a known object from
sensory data, especially when that object may be occluded by other (possibly
unknown) objects. In previous work [Grimson and Lozano-Perez 84, 87] we described a
recognition system, called RAF (for Recognition and Attitude Finder), that identifies
and locates objects from noisy, occluded data. In that work, we concentrated on a
particular subclass of rigid models. If the sensory data provided two-dimensional
geometric data, for example intensity edges from a visual image, we considered
the recognition of objects that consisted of sets of linear segments, or equivalently,
polygonal objects in which some edges are not included. If the sensory data were
three-dimensional, we considered the recognition of objects that consisted of sets of
planar fragments, or equivalently, polyhedral objects in which some of the faces are
not included.

In general, of course, we cannot guarantee that the recognition system will be 
confronted only with rigid polyhedral objects of known size. The RAF has been 
extended to deal with curved objects, in the two-dimensional case [Grimson 1987]. 
In this note, we consider extensions of our method to deal with families of objects 
that are characterized by sets of free parameters. 


2. Recognition as constrained search 

Before dealing with the problem of parameterized parts, we briefly review the
recognition method used [Grimson and Lozano-Perez 84, 87].


2.1 Definition of a solution 

Suppose we are given a set of data fragments, obtained from the boundary of an 
object or objects, and measured in a coordinate system centered about the sensor. 
Suppose we are also given a set of object models, specified by a set of faces (whose 
definition we will make formal shortly) measured in a local coordinate frame specific 
to the model. A solution to the recognition problem consists of a three-tuple

$$(\mathrm{object}_i,\ \{(d_{i_1}, m_{j_1}), (d_{i_2}, m_{j_2}), \ldots, (d_{i_k}, m_{j_k})\},\ (R, \mathbf{v}_0))$$

where $\mathrm{object}_i$ identifies which object from a library of known objects, the $(d, m)$
pairings are associations of a subset of the sensory data $d$ with model faces $m$ from
$\mathrm{object}_i$, $R$ is a rotation matrix, and $\mathbf{v}_0$ is a translation vector such that a vector
$\mathbf{v}_m$ in model coordinates is transformed into a vector $\mathbf{v}_d$ in sensor coordinates by

$$\mathbf{v}_d = R\,\mathbf{v}_m + \mathbf{v}_0,$$

and where this coordinate frame transformation maps the model from its local
coordinate frame into the sensor coordinate frame in such a manner that each data
fragment correctly lies on its assigned model face.
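For the two-dimensional case used later in this note, the pose $(R, \mathbf{v}_0)$ can be sketched in Python by representing $R$ through its rotation angle. The function name `to_sensor` and the tuple representation of vectors are illustrative assumptions, not part of RAF.

```python
import math
from typing import Tuple

Vec = Tuple[float, float]


def to_sensor(v_m: Vec, theta: float, v0: Vec) -> Vec:
    """Map a model-frame point into the sensor frame: v_d = R(theta) v_m + v0."""
    c, s = math.cos(theta), math.sin(theta)
    x, y = v_m
    return (c * x - s * y + v0[0], s * x + c * y + v0[1])


# Rotate the model point (1, 0) by 90 degrees, then translate by (2, 3).
print(to_sensor((1.0, 0.0), math.pi / 2, (2.0, 3.0)))  # approximately (2.0, 4.0)
```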

As has been described elsewhere [Grimson and Lozano-Perez 84, 87], we approach
the recognition problem as one of search. Thus, we first focus on finding
legitimate pairings of data and model fragments, for some subset of the sensory
data. We chose to structure this search process as a constrained depth-first search,
using an interpretation tree (IT). Each node of the tree describes a partial
interpretation of the data, and implicitly contains a set of pairings of data fragments and
model faces. Nodes at the first level of the tree define assignments for the first data
fragment, nodes at the second level define assignments for the first and second data
fragments, and so on. Each node branches at the next level in up to $n + 1$ ways,
where $n$ is the number of model faces in the object. The last branch is a wild card
or null branch and has the effect of excluding the data fragment corresponding to
the current level of the tree from being part of the interpretation.

Given $s$ data fragments, any leaf of the tree specifies an interpretation

$$\{(d_1, m_{j_1}), (d_2, m_{j_2}), \ldots, (d_s, m_{j_s})\},$$

where some of the $m_{j_i}$ may be the wild card character. By excluding such matches,
the leaf yields a partial interpretation

$$\{(d_{i_1}, m_{j_{i_1}}), (d_{i_2}, m_{j_{i_2}}), \ldots, (d_{i_k}, m_{j_{i_k}})\}$$

where $1 \le i_1 < i_2 < \cdots < i_k \le s$, but these indices may not include the entire set from 1
to $s$. This interpretation may then be used to solve for a rigid, scaled transformation
that maps model faces into corresponding data fragments, if such a transformation
exists. Thus, by searching for leaves of the tree and testing that the interpretation
there yields a legal transformation, we can find possible instances of object models
in the data.

Since this search process is inherently exponential, the key to an
efficient solution is to use constraints to remove large subtrees from consideration
without having to explore them explicitly. In [Grimson and Lozano-Perez 84, 87]
we describe a set of constraints based on the local shape of parts of objects, either
in two dimensions or in three. In that work, the object models and the sensory data
consist of linear edge or face fragments. The constraints include the following:

• The length of a data fragment must be smaller than the length of a corresponding
model fragment, up to some bounded measurement error;

• The angle between the normals to a pair of data fragments must differ from the
angle between the normals of the corresponding model fragments by no more
than a bounded measurement error;

• The range of distances between two data fragments must lie within the range
of distances of the corresponding model fragments, where the model range has
been expanded to account for measurement errors;

• The range of components of a vector spanning the two data fragments in the
direction of each fragment's normal must lie within the corresponding range
of components for vectors spanning the model fragments, modulo measurement
error.
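As an illustration only, the unary length constraint and the binary distance constraint can be sketched for two-dimensional line segments as below. The tolerances `eps_l` and `eps_d` stand in for the bounded measurement errors, and all function names are hypothetical. The minimum distance between segments uses a standard point-to-segment test plus a proper-crossing check; the maximum is always attained at a pair of endpoints.

```python
import math
from typing import Tuple

Vec = Tuple[float, float]
Segment = Tuple[Vec, Vec]


def seg_length(seg: Segment) -> float:
    return math.dist(seg[0], seg[1])


def length_ok(data_seg: Segment, model_seg: Segment, eps_l: float) -> bool:
    """Unary test: a data fragment is no longer than its model fragment, plus error."""
    return seg_length(data_seg) <= seg_length(model_seg) + eps_l


def _point_segment_distance(p: Vec, seg: Segment) -> float:
    (ax, ay), (bx, by) = seg
    dx, dy = bx - ax, by - ay
    l2 = dx * dx + dy * dy
    t = 0.0 if l2 == 0.0 else ((p[0] - ax) * dx + (p[1] - ay) * dy) / l2
    t = max(0.0, min(1.0, t))  # clamp the foot of the perpendicular to the segment
    return math.hypot(p[0] - (ax + t * dx), p[1] - (ay + t * dy))


def _orient(o: Vec, a: Vec, b: Vec) -> float:
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def distance_range(s1: Segment, s2: Segment) -> Tuple[float, float]:
    """Smallest and largest distance between a point of s1 and a point of s2."""
    (a, b), (c, d) = s1, s2
    crosses = (_orient(c, d, a) * _orient(c, d, b) < 0.0
               and _orient(a, b, c) * _orient(a, b, d) < 0.0)
    lo = 0.0 if crosses else min(_point_segment_distance(a, s2), _point_segment_distance(b, s2),
                                 _point_segment_distance(c, s1), _point_segment_distance(d, s1))
    hi = max(math.dist(p, q) for p in (a, b) for q in (c, d))  # attained at endpoints
    return lo, hi


def distance_ok(data_pair: Tuple[Segment, Segment],
                model_pair: Tuple[Segment, Segment], eps_d: float) -> bool:
    """Binary test: the data distance range lies inside the error-expanded model range."""
    d_lo, d_hi = distance_range(*data_pair)
    m_lo, m_hi = distance_range(*model_pair)
    return m_lo - eps_d <= d_lo and d_hi <= m_hi + eps_d
```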

In [Grimson 87] we extended these constraints to include curved objects in two 
dimensions. 


2.2 The constraints reduce the search 

Given these unary and binary constraints, the constrained search process can be
straightforwardly specified. Suppose the search process is currently at some node at
level $k$ in the interpretation tree, with a consistent partial interpretation given by

$$\{(d_1, m_{j_1}), (d_2, m_{j_2}), \ldots, (d_k, m_{j_k})\}.$$

We now consider the next data fragment $d_{k+1}$, and its possible assignment to model
face $m_{j_{k+1}}$, where $j_{k+1}$ varies from 1 to $n + 1$.

The following rules hold.

• If $m_{j_{k+1}}$ is the wild card match, then the new interpretation

$$\{(d_1, m_{j_1}), (d_2, m_{j_2}), \ldots, (d_{k+1}, m_{j_{k+1}})\}$$

is consistent, and we continue downward in our search.

• If $m_{j_{k+1}}$ is a real model edge segment, we must verify that the length constraint
holds for matching $d_{k+1}$ to $m_{j_{k+1}}$, and that the angle, distance and component
constraints hold for the pairings $(d_i, m_{j_i})$ and $(d_{k+1}, m_{j_{k+1}})$, for $1 \le i \le k$.

• If all of these constraints are true, then

$$\{(d_1, m_{j_1}), (d_2, m_{j_2}), \ldots, (d_{k+1}, m_{j_{k+1}})\}$$

is a consistent partial interpretation, and we continue our depth-first search. If
one of them is false, then the partial interpretation is inconsistent. In this case,
we increment the model face index $j_{k+1}$ by 1 and try again, until $j_{k+1} = n + 1$.

If the search process is currently at some node at level $k$ in the interpretation tree,
and has an inconsistent partial interpretation given by

$$\{(d_1, m_{j_1}), (d_2, m_{j_2}), \ldots, (d_k, m_{j_k})\}$$

then it is in the process of backtracking. If $j_k = n + 1$ (the wild card) we backtrack
up another level, otherwise we increment $j_k$ and continue.
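The rules above amount to a depth-first generator over the interpretation tree. The sketch below assumes the unary and binary tests are supplied as predicates `unary(i, p)` and `binary(i, j, p, q)` (data fragment i matched to model face p, data fragment j to face q); these names, and the use of `None` for the wild card, are assumptions. Backtracking happens implicitly when a recursive generator is exhausted.

```python
from typing import Callable, Iterator, List, Optional

WILD = None  # the wild-card (null) branch


def interpretations(s: int, n: int,
                    unary: Callable[[int, int], bool],
                    binary: Callable[[int, int, int, int], bool]) -> Iterator[List[Optional[int]]]:
    """Yield every leaf of the constrained interpretation tree.

    Data fragments are 0..s-1 and model faces 0..n-1; leaf[i] is the face
    assigned to data fragment i, or WILD.
    """
    assignment: List[Optional[int]] = []

    def consistent(k: int, p: int) -> bool:
        # Unary test on (d_k, m_p), then binary tests against every earlier real match.
        if not unary(k, p):
            return False
        return all(binary(i, k, q, p) for i, q in enumerate(assignment) if q is not WILD)

    def search(k: int) -> Iterator[List[Optional[int]]]:
        if k == s:
            yield list(assignment)
            return
        for p in [*range(n), WILD]:  # real faces first, the wild card last
            if p is WILD or consistent(k, p):
                assignment.append(p)
                yield from search(k + 1)
                assignment.pop()  # backtrack and try the next face

    yield from search(0)
```

Every leaf is emitted here, including the all-wild-card leaf, so in use the leaves would still be passed to the model test of Section 2.3.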


2.3 Model tests 

Once the search process reaches a leaf of the interpretation tree, we have accounted 
for all of the data points. We are now ready to determine if the interpretation is in 
fact globally valid. To do this, we solve for a rigid transformation mapping points
$\mathbf{v}_m$ in model coordinates into points $\mathbf{v}_d$ in sensor coordinates,

$$\mathbf{v}_d = R\,\mathbf{v}_m + \mathbf{v}_0$$

where $R$ is a rotation matrix and $\mathbf{v}_0$ is a translation vector. We can solve for this
transformation in a number of ways [e.g. Grimson and Lozano-Perez 84, 87; Ayache
and Faugeras 86].

Given such a transformation, which is usually found using some type of least
squares fit, we must then ensure that the interpretation actually satisfies it. We do
this by considering each of the data fragments associated with a real model
fragment in the interpretation, and transforming the associated model fragment by the
computed transform. For each such fragment, we then verify that the transformed
fragment differs in position and orientation from its associated data fragment by
amounts that are less than some acceptable error bounds. These bounds on
transform error can be obtained from the predefined bounds on the sensor error [Grimson
86b]. Any interpretation that passes such a model test is a consistent interpretation
of the data.
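One common least-squares choice, used here purely as an assumed stand-in for the fits cited above, is the closed-form two-dimensional Procrustes solution on corresponding points (for example, points sampled on each matched fragment). The sketch estimates $\theta$ and $\mathbf{v}_0$, then applies a position-only model test; checking orientation error as well, as the text requires, would be an additional step. Function names and the tolerance `tol` are hypothetical.

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float]


def fit_rigid_2d(model: Sequence[Vec], data: Sequence[Vec]) -> Tuple[float, Vec]:
    """Least-squares (theta, v0) minimising sum |R(theta) m_k + v0 - d_k|^2."""
    n = len(model)
    mx = sum(p[0] for p in model) / n
    my = sum(p[1] for p in model) / n
    dx = sum(q[0] for q in data) / n
    dy = sum(q[1] for q in data) / n
    dot = cross = 0.0
    for (px, py), (qx, qy) in zip(model, data):
        px, py, qx, qy = px - mx, py - my, qx - dx, qy - dy  # centre both point sets
        dot += px * qx + py * qy
        cross += px * qy - py * qx
    theta = math.atan2(cross, dot)  # the angle maximising sum <d_k, R m_k>
    c, s = math.cos(theta), math.sin(theta)
    v0 = (dx - (c * mx - s * my), dy - (s * mx + c * my))  # centroid_d - R centroid_m
    return theta, v0


def model_test(model: Sequence[Vec], data: Sequence[Vec],
               theta: float, v0: Vec, tol: float) -> bool:
    """Position-only check: every transformed model point lies within tol of its datum."""
    c, s = math.cos(theta), math.sin(theta)
    for (px, py), q in zip(model, data):
        x, y = c * px - s * py + v0[0], s * px + c * py + v0[1]
        if math.dist((x, y), q) > tol:
            return False
    return True
```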


2.4 Additional search reductions 

While the constrained search technique described above will succeed in finding all
consistent interpretations of the sensory data for a given object model, it is not
particularly computationally efficient. This is mostly due to the problem of segmenting
the data to determine subsets that belong to a single object. Indeed, if all of the
sensory data do belong to one object, the described method is known to be quite
efficient, as has been verified both empirically [Grimson and Lozano-Perez 84, 87]
and theoretically [Grimson 1986a]. In order to improve the efficiency of the method,
we can add two additional methods to our search process, both previously discussed
for the case of linear fragments in [Grimson and Lozano-Perez 87], and extended to
circular segments in [Grimson 87].

The first is to use a parameter hashing scheme, such as a Hough transform, 
to hypothesize small subspaces of the entire search space that are likely to contain 
an interpretation. The second is to use a measure of matching, such as the portion 
of the object perimeter correctly accounted for by the matched sensory data, to 
prematurely terminate the search process. In the work described here, we use only 
the second heuristic. 
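The second heuristic can be sketched as a coverage score: the fraction of the model perimeter accounted for by data fragments that are matched to real (non-wild-card) model fragments. The threshold `accept_fraction` and its default of 0.8 are illustrative assumptions, not values taken from this work.

```python
from typing import Optional, Sequence


def perimeter_coverage(data_lengths: Sequence[float],
                       assignment: Sequence[Optional[int]],
                       model_perimeter: float) -> float:
    """Fraction of the model perimeter explained by data matched to real model fragments."""
    matched = sum(length for length, m in zip(data_lengths, assignment) if m is not None)
    return matched / model_perimeter


def good_enough(data_lengths: Sequence[float], assignment: Sequence[Optional[int]],
                model_perimeter: float, accept_fraction: float = 0.8) -> bool:
    """Terminate the search once an interpretation explains enough of the perimeter."""
    return perimeter_coverage(data_lengths, assignment, model_perimeter) >= accept_fraction
```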


3. Parameterized Families 

3.1 Examples of Parameterized Objects 

While our previous work has illustrated the utility of our approach to the
problem of rigid objects, we are interested here in extending the method to deal with
parameterized objects. We consider a number of different possibilities.

Scale

Perhaps the simplest example of a parameterized family is that defined by a rigid
object that can take on a range of possible sizes, that is, the shape of the object is
fixed, but the overall scale factor can vary. Many techniques for object recognition
and localization can easily deal with this case, since the scale factor can simply be 
considered part of the coordinate frame transformation required to map the model 
patches into their corresponding sensed patches. 


Coordinate-frame transformations 

A more interesting class of parameterized objects consists of those that involve a limited
number of moving parts. A good example is a pair of scissors, which has a single 
degree of freedom, namely the rotation of the two blades relative to a common 
joint. We would like to be able to recognize the scissors, independent of the relative 
orientation of the blades, and without requiring a different model to represent each 
orientation. This class could further be extended to include scissors of different 
sizes. 


Stretching deformations 

A third class of parameterized objects consists of those in which subparts can stretch along
an axis. An example would be a family of hammers, for which there is a generic 
handle shape, but which can stretch along the axis of the handle, as indicated in 
Figure 1. 


Figure 1. A set of parameterized subparts, in which the generic shape in the upper left is 
stretched along the axis of the shape. 

Our goal is to extend our recognition method to handle such classes of parameterized 
families. 
In dealing with parameterized families, we restrict our attention to two-dimensional
objects that are composed of sets of linear edge fragments. Each linear edge
fragment consists of two endpoints, and a unit vector normal to the line between
them and pointing away from the interior of the object. Formally, this is given by

$$\mathrm{linear}_i = (\hat{\mathbf{n}}_i, (\mathbf{b}_i, \mathbf{e}_i)).$$

Note that a point on the edge, together with its normal, can be represented by

$$\hat{\mathbf{n}}_i \quad\text{and}\quad \mathbf{b}_i + \alpha_i \hat{\mathbf{t}}_i, \qquad \alpha_i \in [0, \ell_i]$$

where $\hat{\mathbf{n}}_i$ is the unit normal vector, $\hat{\mathbf{t}}_i$ is a unit tangent vector, oriented so that it
points from $\mathbf{b}_i$ to $\mathbf{e}_i$, and $\alpha_i$ can vary from 0 to the length $\ell_i$ of the edge.
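A minimal Python rendering of this edge representation, assuming tuples for vectors; the class name `LinearEdge` and its members are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Tuple

Vec = Tuple[float, float]


@dataclass(frozen=True)
class LinearEdge:
    """A linear edge fragment (n_i, (b_i, e_i)) with unit outward normal n."""
    n: Vec
    b: Vec
    e: Vec

    @property
    def length(self) -> float:
        return math.dist(self.b, self.e)

    @property
    def tangent(self) -> Vec:
        """Unit tangent t_i pointing from b to e."""
        length = self.length
        return ((self.e[0] - self.b[0]) / length, (self.e[1] - self.b[1]) / length)

    def point(self, alpha: float) -> Vec:
        """The point b_i + alpha_i t_i, for alpha_i in [0, length]."""
        if not 0.0 <= alpha <= self.length:
            raise ValueError("alpha must lie in [0, length]")
        tx, ty = self.tangent
        return (self.b[0] + alpha * tx, self.b[1] + alpha * ty)
```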


3.2 Possible approaches 

A large number of methods have been explored in the literature for recognizing rigid 
objects, both in two dimensions and in three. When considering parameterized 
objects, far fewer methods have been considered. In particular, while a number 
of schemes have been suggested for representing parameterized objects, such as 
generalized cylinders and superquadrics, at this point very few actual recognition 
engines based on such parameterized representations have been demonstrated. The 
best known such system is probably ACRONYM [Brooks 1981]. Within the context 
of our approach to recognition, there are two distinct alternatives for extending 
the method to handle parameterized parts, both related in a global sense to the 
approach taken by Brooks. 

The first approach is to extend our geometric constraints to directly incorporate 
the free parameters. In this case, the search process would become a constraint 
propagation technique, in which the current range of possible values for each of the 
parameters would be passed from a parent node of the IT to each of its children. At each
new node, the constraints imposed by matching the new data patch to its assigned
model patch would be used to refine the range of free parameters, which would then
be passed to that node's children. If any parameter is reduced to an empty range of
values, the interpretation is inconsistent and the search along that subtree can be
terminated.

The main difficulty with this approach is finding a clean way of representing 
the parameterized constraints, especially in a manner that will easily allow the 
computing and updating of feasible ranges for each of the parameters. Consider our 
example of a pair of scissors, where the parameter to be determined is the angle 
between the two blades. If two data fragments are being considered as belonging to 
two model fragments that are part of the same rigid subpart, then the constraints 
are the same as in our earlier approach. They either indicate consistency, in which 
case the range of possible values for the rotation parameter remains the same as it 
was before considering this pairing, or they indicate inconsistency, in which case the 
search must backtrack. On the other hand, suppose two data fragments are being 
considered as belonging to model fragments on different rigid subparts. In this case, 
we need a means of expressing the range of possible values for the rotation parameter
as an explicit function of the relative geometry of the two model fragments and the 
two data fragments. This may prove difficult to obtain. 

The second approach is to break the object model into rigid subparts, all of
which are connected to a global model-based coordinate frame through a series of
coordinate frame transformations. Each subpart can then be recognized by application
of our earlier technique, including a free scale parameter. Once the subparts have
been recognized and located, we must check that they are consistent by confirming
that the parts satisfy a set of predetermined global coordinate-frame constraints.

Consider the earlier example of a pair of scissors, with a free overall scale factor. 
In this approach, we treat each blade of the scissors as a rigid subpart. Thus, we 
attempt to locate instances of the right and left blade in the sensory data. Once we 
have done this, we then confirm that the subparts are parts of a consistent whole. 
In the case of the scissors, this would involve checking two things: (1) the scale 
factor associated with each blade is roughly the same, and (2) the transformations 
from model coordinates to sensory coordinates associated with each blade are such 
that the position, in sensor coordinates, of the pin joining the two blades is roughly 
the same (i.e. the located instances of the blades in the data are rotated about the 
expected common axis). The advantage of this second method is that the geometric 
constraints remain simple, yet combinatorially powerful. 

Note that we can apply our search for rigid subparts in several ways. The 
simplest is to search the data independently for each rigid subpart, then test all 
possible combinations of subparts for consistent wholes. A more efficient method 
would be to first search the data for one subpart (e.g. the largest). For each 
candidate solution found in the data, we can then use limits on the ranges of the 
parameters to restrict the possible positions of the other subparts in the sensory 
data. Using this reduced data set, we can then search for instances of the other
subparts, testing each instance for global consistency. If no instance of the initial
seed subpart is found (for example, because it is occluded in the data), we can then consider
the next seed subpart (e.g. the next largest) and proceed as before.
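The seed-subpart strategy can be sketched with nested generators, which also provide the extra level of backtracking needed when a seed candidate cannot be completed. Everything here is an assumption for illustration: `find(part, data)` yields candidate poses for one rigid subpart, `restrict(poses, part, data)` returns the reduced data set implied by the poses found so far, and `consistent(poses)` applies the global coordinate-frame constraints. A subpart that cannot be found is skipped, in the spirit of the wild-card branch, so partial wholes are also reported.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List, Sequence

Poses = Dict[str, Any]
Finder = Callable[[str, Any], Iterable[Any]]
Restrictor = Callable[[Poses, str, Any], Any]
Checker = Callable[[Poses], bool]


def _extend(poses: Poses, remaining: List[str], data: Any,
            find: Finder, restrict: Restrictor, consistent: Checker) -> Iterator[Poses]:
    if not remaining:
        yield dict(poses)
        return
    part, rest = remaining[0], remaining[1:]
    # Search for this subpart only in the data left plausible by the poses so far.
    for pose in find(part, restrict(poses, part, data)):
        poses[part] = pose
        if consistent(poses):
            yield from _extend(poses, rest, data, find, restrict, consistent)
        del poses[part]  # backtrack to the next candidate for this subpart
    # Wild-card-like branch: allow this subpart to be missing (e.g. occluded).
    yield from _extend(poses, rest, data, find, restrict, consistent)


def find_wholes(seeds: Sequence[str], data: Any, find: Finder,
                restrict: Restrictor, consistent: Checker) -> Iterator[Poses]:
    """seeds: subpart names in preferred seed order (e.g. largest first)."""
    for seed in seeds:
        others = [p for p in seeds if p != seed]
        seed_found = False
        for pose in find(seed, data):
            seed_found = True
            yield from _extend({seed: pose}, others, data, find, restrict, consistent)
        if seed_found:
            return  # fall back to the next seed only if this one never appeared
```

A caller would typically prefer wholes that contain every subpart over the partial ones.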

In this paper, we explore both options. We first derive the set of geometric 
constraints on interpretations, and then illustrate the search process on some simple 
examples. 


3.3 Scale Factors 

Perhaps the simplest family of objects to consider is that in which a single, rigid
object of known shape can undergo an arbitrary global scaling, within some limits.
We need to consider how to adjust the recognition process so that it can recover not only
where an object is in the data, but also its overall size.

We assume that the scale factor is applied to the data, so that the transformation
from a point in model coordinates, $\mathbf{v}_m$, to sensor coordinates, $\mathbf{v}_d$, is given
by

$$s\,\mathbf{v}_d = R_\theta \mathbf{v}_m + \mathbf{v}_0$$

where $s$ is a scale factor, $\theta$ is an angle, $R_\theta$ is a rotation matrix of angle $\theta$, and $\mathbf{v}_0$ is
a translation vector.


3.3.1 Length constraint 


If we are matching data edge $d_i$, given by

$$(\hat{\mathbf{n}}_i, (\mathbf{b}_i, \mathbf{e}_i))$$

to model edge $m_p$, given by

$$(\hat{\mathbf{N}}_p, (\mathbf{B}_p, \mathbf{E}_p))$$

then the scaled length of the data edge must be less than the length of the corresponding
model edge, modulo measurement error. We let $\ell_i$ denote the length of the data
fragment, and $L_p$ denote the corresponding length of the model fragment, where
these lengths are given by

$$\ell_i = |\mathbf{b}_i - \mathbf{e}_i|, \qquad L_p = |\mathbf{B}_p - \mathbf{E}_p|.$$

Then we must have

$$s\,\ell_i \le L_p + \epsilon_l$$

where $\epsilon_l$ is a predefined upper bound on the amount of error inherent in measuring
the length of an edge.

We can define

$$\text{scaled-length-constraint}(i, p) = \left[0, \frac{L_p + \epsilon_l}{\ell_i}\right],$$

that is, the range of scales consistent with this assignment. This constraint returns
a (possibly empty) range of values.
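The scaled length constraint translates directly into code. This sketch, with a hypothetical function name, returns the interval as a `(low, high)` tuple and, as an added assumption, treats a degenerate zero-length data edge as placing no bound on the scale.

```python
import math
from typing import Tuple

Interval = Tuple[float, float]


def scaled_length_constraint(l_i: float, L_p: float, eps_l: float) -> Interval:
    """Scales s with s * l_i <= L_p + eps_l, i.e. [0, (L_p + eps_l) / l_i]."""
    if l_i <= 0.0:
        return (0.0, math.inf)  # a degenerate data edge gives no bound
    return (0.0, (L_p + eps_l) / l_i)
```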


3.3.2 Angle constraint 

Let $\theta_{ij}$ denote the angle between $\hat{\mathbf{n}}_i$ and $\hat{\mathbf{n}}_j$, and let $\Theta_{pq}$ denote the angle between
$\hat{\mathbf{N}}_p$ and $\hat{\mathbf{N}}_q$. We let

$$\text{binary-angle-constraint}(i, j, p, q) = \text{True} \iff \theta_{ij} \in [\Theta_{pq} - 2\epsilon_a, \Theta_{pq} + 2\epsilon_a]$$

where all arithmetic comparisons are performed modulo $2\pi$ and where $\epsilon_a$ is an upper
bound on the amount of error inherent in determining the direction of a normal.
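The modulo-$2\pi$ comparison is the only delicate part of the angle test. In the sketch below, which assumes unit normals given as tuples, the signed angle between two normals is computed with `atan2`, and the difference between data and model angles is wrapped into $(-\pi, \pi]$ before being compared with $2\epsilon_a$. Function names are hypothetical.

```python
import math
from typing import Tuple

Vec = Tuple[float, float]


def signed_angle(u: Vec, v: Vec) -> float:
    """Signed angle from u to v, in (-pi, pi]."""
    return math.atan2(u[0] * v[1] - u[1] * v[0], u[0] * v[0] + u[1] * v[1])


def binary_angle_constraint(n_i: Vec, n_j: Vec, N_p: Vec, N_q: Vec, eps_a: float) -> bool:
    """True iff theta_ij lies within 2*eps_a of Theta_pq, comparing modulo 2*pi."""
    diff = signed_angle(n_i, n_j) - signed_angle(N_p, N_q)
    diff = math.atan2(math.sin(diff), math.cos(diff))  # wrap into (-pi, pi]
    return abs(diff) <= 2.0 * eps_a
```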


3.3.3 Component constraint 

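The third constraint concerns the separation of the two edge fragments. In
particular, we consider the range of components of a vector between the two edge fragments,
in the direction of each of the edge normals. Algebraically, this is expressed by the
dot product

$$\langle \mathbf{b}_i + \alpha_i \hat{\mathbf{t}}_i - \mathbf{b}_j - \alpha_j \hat{\mathbf{t}}_j, \hat{\mathbf{n}}_i \rangle$$

which, since $\hat{\mathbf{t}}_i$ is perpendicular to $\hat{\mathbf{n}}_i$, reduces to

$$\langle \mathbf{b}_i - \mathbf{b}_j, \hat{\mathbf{n}}_i \rangle - \alpha_j \langle \hat{\mathbf{t}}_j, \hat{\mathbf{n}}_i \rangle, \qquad \alpha_j \in [0, \ell_j].$$

Of course, there is an equivalent constraint for components in the direction of $\hat{\mathbf{n}}_j$.
Note that this expression actually determines a range of values, with extrema when
$\alpha_j = 0, \ell_j$. We denote this by

$$d_{l,ij} = \min\{\langle \mathbf{b}_i - \mathbf{b}_j, \hat{\mathbf{n}}_i \rangle - \alpha_j \langle \hat{\mathbf{t}}_j, \hat{\mathbf{n}}_i \rangle \mid \alpha_j \in \{0, \ell_j\}\}$$

$$d_{h,ij} = \max\{\langle \mathbf{b}_i - \mathbf{b}_j, \hat{\mathbf{n}}_i \rangle - \alpha_j \langle \hat{\mathbf{t}}_j, \hat{\mathbf{n}}_i \rangle \mid \alpha_j \in \{0, \ell_j\}\}$$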

These ranges can be computed both for pairs of data edges and pairs of model 
edges. In the ideal case, consistency will hold only if the data range is contained 
within the model range (since the data edges may correspond to only parts of the 
model edges). As in the case of the other constraints, we also need to account for 
error in the measurements. We derive a simple method for doing this below. 
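A short sketch of computing $[d_{l,ij}, d_{h,ij}]$ from two edges, together with the ideal (error-free) containment test; the same function applies to pairs of model edges. It assumes $\hat{\mathbf{n}}_i$ is a unit tuple and evaluates only the endpoints $\alpha_j \in \{0, \ell_j\}$, as in the definitions above. Function names are hypothetical, and the error expansion of the model range is not included.

```python
import math
from typing import Tuple

Vec = Tuple[float, float]


def _dot(u: Vec, v: Vec) -> float:
    return u[0] * v[0] + u[1] * v[1]


def component_range(n_i: Vec, b_i: Vec, b_j: Vec, e_j: Vec) -> Tuple[float, float]:
    """[d_l, d_h]: extremes of <b_i - b_j, n_i> - alpha_j <t_j, n_i> over alpha_j in {0, l_j}."""
    l_j = math.dist(b_j, e_j)
    t_j = ((e_j[0] - b_j[0]) / l_j, (e_j[1] - b_j[1]) / l_j)
    base = _dot((b_i[0] - b_j[0], b_i[1] - b_j[1]), n_i)
    values = [base - a * _dot(t_j, n_i) for a in (0.0, l_j)]
    return min(values), max(values)


def contained(inner: Tuple[float, float], outer: Tuple[float, float]) -> bool:
    """Ideal consistency: the data range lies inside the model range."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]
```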

Consider the base case, shown in Figure 2a. The perpendicular distance from
the endpoint of one edge to the other edge is shown as $D_1$. In Figure 2b, the
edge is rotated by $\epsilon_a$ about its midpoint, and the new perpendicular distance $X$ is
shown. We need to relate $X$ to measurable values. We already have $D_1$. We can also
measure $S$, the distance from the midpoint of the edge to the perpendicular dropped
from the endpoint of the other edge, as shown. Straightforward trigonometry then
yields the new distance

$$X = (D_1 - S\sin\epsilon_a)\cos\epsilon_a.$$

Since the position of the second edge is not known exactly, we must adjust this
expression, to yield one limit on the range of possible measurements:

$$D_{l,pq} = (D_1 - S\sin\epsilon_a)\cos\epsilon_a - \epsilon_p.$$

The other extreme is shown in Figure 2c. Trigonometric manipulation yields
the following upper bound

$$D_{h,pq} = (S - D_1\sin\epsilon_a)\sin\epsilon_a + D_1\sec\epsilon_a + \epsilon_p.$$

Thus, given two model edges indexed by $p, q$, we can compute a range of possible
measurements (modulo known error bounds) by using $D_{l,pq}$ and $D_{h,pq}$ computed
over all the endpoints of the edges. We denote this range by $[M_{l,pq}, M_{h,pq}]$.

Given a range of projections $[d_{l,ij}, d_{h,ij}]$ of data edge $i$ onto edge $j$, and a corresponding
range of projections $[M_{l,pq}, M_{h,pq}]$ for model edge $p$ onto model edge $q$, we need to determine
bounds on $s$ such that

$$[s\,d_{l,ij}, s\,d_{h,ij}] \subseteq [M_{l,pq}, M_{h,pq}].$$


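The following cases hold, for a positive scale factor $s$:

If $\langle \hat{\mathbf{t}}_j, \hat{\mathbf{n}}_i \rangle > 0$ then

• if $\langle \mathbf{b}_i - \mathbf{b}_j, \hat{\mathbf{n}}_i \rangle > 0$, then $s \le M_{h,pq} / \langle \mathbf{b}_i - \mathbf{b}_j, \hat{\mathbf{n}}_i \rangle$;

• if $\langle \mathbf{b}_i - \mathbf{b}_j, \hat{\mathbf{n}}_i \rangle < 0$, then $s \ge M_{h,pq} / \langle \mathbf{b}_i - \mathbf{b}_j, \hat{\mathbf{n}}_i \rangle$;

• if $\langle \mathbf{b}_i - \mathbf{b}_j, \hat{\mathbf{n}}_i \rangle - \ell_j \langle \hat{\mathbf{t}}_j, \hat{\mathbf{n}}_i \rangle > 0$, then $s \ge M_{l,pq} / (\langle \mathbf{b}_i - \mathbf{b}_j, \hat{\mathbf{n}}_i \rangle - \ell_j \langle \hat{\mathbf{t}}_j, \hat{\mathbf{n}}_i \rangle)$;

• if $\langle \mathbf{b}_i - \mathbf{b}_j, \hat{\mathbf{n}}_i \rangle - \ell_j \langle \hat{\mathbf{t}}_j, \hat{\mathbf{n}}_i \rangle < 0$, then $s \le M_{l,pq} / (\langle \mathbf{b}_i - \mathbf{b}_j, \hat{\mathbf{n}}_i \rangle - \ell_j \langle \hat{\mathbf{t}}_j, \hat{\mathbf{n}}_i \rangle)$.

If $\langle \hat{\mathbf{t}}_j, \hat{\mathbf{n}}_i \rangle < 0$ then

• if $\langle \mathbf{b}_i - \mathbf{b}_j, \hat{\mathbf{n}}_i \rangle > 0$, then $s \ge M_{l,pq} / \langle \mathbf{b}_i - \mathbf{b}_j, \hat{\mathbf{n}}_i \rangle$;

• if $\langle \mathbf{b}_i - \mathbf{b}_j, \hat{\mathbf{n}}_i \rangle < 0$, then $s \le M_{l,pq} / \langle \mathbf{b}_i - \mathbf{b}_j, \hat{\mathbf{n}}_i \rangle$;

• if $\langle \mathbf{b}_i - \mathbf{b}_j, \hat{\mathbf{n}}_i \rangle - \ell_j \langle \hat{\mathbf{t}}_j, \hat{\mathbf{n}}_i \rangle > 0$, then $s \le M_{h,pq} / (\langle \mathbf{b}_i - \mathbf{b}_j, \hat{\mathbf{n}}_i \rangle - \ell_j \langle \hat{\mathbf{t}}_j, \hat{\mathbf{n}}_i \rangle)$;

• if $\langle \mathbf{b}_i - \mathbf{b}_j, \hat{\mathbf{n}}_i \rangle - \ell_j \langle \hat{\mathbf{t}}_j, \hat{\mathbf{n}}_i \rangle < 0$, then $s \ge M_{h,pq} / (\langle \mathbf{b}_i - \mathbf{b}_j, \hat{\mathbf{n}}_i \rangle - \ell_j \langle \hat{\mathbf{t}}_j, \hat{\mathbf{n}}_i \rangle)$.

Figure 2. Errors in computing the direction constraint. (a) The component of a vector
from one endpoint in the direction of the other edge's normal is given by the perpendicular
distance d to the extended edge. (b) Since the actual normal is only accurate to within
$\epsilon_a$, one extreme case is given by rotating the extended edge about its midpoint by that
amount and finding the new perpendicular distance. (c) The other extreme is obtained by
considering the other endpoint.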


Thus, based on the measured and model constraint ranges, we can compute a range
of scale factors $[s_{l,i,j,p,q}, s_{h,i,j,p,q}]$ for which the assignment of data edges to model
edges is consistent. We let

$$\text{scaled-component-constraint}(i, j, p, q) = [s_{l,i,j,p,q}, s_{h,i,j,p,q}].$$

Although we will not use it here, note that we could derive a similar form for the
distance constraint.
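Rather than enumerating the sign cases, the same bounds can be obtained by treating the containment condition as two linear inequalities in $s$, namely $s\,d_{l,ij} \ge M_{l,pq}$ and $s\,d_{h,ij} \le M_{h,pq}$, with $s > 0$. The following sketch does exactly that; the function signature and returning `None` for an empty range are assumed conventions.

```python
import math
from typing import Optional, Tuple

Interval = Tuple[float, float]


def scaled_component_constraint(d_l: float, d_h: float,
                                m_l: float, m_h: float) -> Optional[Interval]:
    """Scales s > 0 with [s*d_l, s*d_h] inside [m_l, m_h]; None if there are none."""
    lo, hi = 0.0, math.inf
    # Lower end: s * d_l >= m_l.
    if d_l > 0.0:
        lo = max(lo, m_l / d_l)
    elif d_l < 0.0:
        hi = min(hi, m_l / d_l)  # dividing by a negative number flips the inequality
    elif m_l > 0.0:
        return None  # 0 >= m_l is impossible
    # Upper end: s * d_h <= m_h.
    if d_h > 0.0:
        hi = min(hi, m_h / d_h)
    elif d_h < 0.0:
        lo = max(lo, m_h / d_h)
    elif m_h < 0.0:
        return None  # 0 <= m_h is impossible
    return (lo, hi) if lo <= hi else None
```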

Given these unary and binary constraints, we can now modify our constrained
search process. With each node of the search tree, we associate a range of consistent
values for the scale parameter, which we will denote $[s_l^{(k)}, s_h^{(k)}]$, where $k$ indicates
the level of the node in the tree. Suppose the search process is currently at some
node at level $k$ in the interpretation tree, with a consistent partial interpretation
given by

$$\{(d_1, m_{j_1}), (d_2, m_{j_2}), \ldots, (d_k, m_{j_k})\}.$$

We now consider the next data fragment $d_{k+1}$, and its possible assignment to model
fragment $m_{j_{k+1}}$, where $j_{k+1}$ varies from 1 to $n + 1$.

The following rules hold.

• If $m_{j_{k+1}}$ is the wild card match, then the new interpretation

$$\{(d_1, m_{j_1}), (d_2, m_{j_2}), \ldots, (d_{k+1}, m_{j_{k+1}})\}$$

is consistent, and we continue downward in our search, setting

$$[s_l^{(k+1)}, s_h^{(k+1)}] = [s_l^{(k)}, s_h^{(k)}].$$

• If $m_{j_{k+1}}$ is a linear edge segment, we let

$$[s_l^{(k+1)}, s_h^{(k+1)}] = [s_l^{(k)}, s_h^{(k)}] \cap \text{scaled-length-constraint}(k+1, j_{k+1}).$$

If this new range is non-empty, then for all $i \in \{1, \ldots, k\}$ such that $d_i$ is a linear
edge fragment, we verify that

$$\text{binary-angle-constraint}(i, k+1, j_i, j_{k+1}) = \text{True}$$

and we set

$$[s_l^{(k+1)}, s_h^{(k+1)}] = [s_l^{(k+1)}, s_h^{(k+1)}] \cap \bigcap_i \text{scaled-component-constraint}(i, k+1, j_i, j_{k+1}) \cap \bigcap_i \text{scaled-component-constraint}(k+1, i, j_{k+1}, j_i).$$

• If $[s_l^{(k+1)}, s_h^{(k+1)}]$ is non-empty, then

$$\{(d_1, m_{j_1}), (d_2, m_{j_2}), \ldots, (d_{k+1}, m_{j_{k+1}})\}$$

is a consistent partial interpretation, and we continue our depth-first search.
Otherwise, the partial interpretation is inconsistent. In this case, we increment
the model fragment index $j_{k+1}$ by 1 and try again, until $j_{k+1} = n + 1$.

If the search process is currently at some node at level $k$ in the interpretation tree,
and has an inconsistent partial interpretation given by

$$\{(d_1, m_{j_1}), (d_2, m_{j_2}), \ldots, (d_k, m_{j_k})\}$$

then it is in the process of backtracking. If $j_k = n + 1$ (the wild card) we backtrack
up another level, otherwise we increment $j_k$ and continue.

In this manner, we can naturally extend our constrained search method to 
recognize objects from families in which the free parameter is overall scale. An 
example is shown in Figure 3. 
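The interval bookkeeping at a single node can be sketched as follows. The constraint functions are passed in as callables (`length_range`, `angle_ok`, `component_range`), whose names and signatures are assumptions; an empty range is represented by `None`, which signals that the search should try the next model fragment.

```python
import math
from typing import Callable, Optional, Sequence, Tuple

Interval = Tuple[float, float]
WILD = None  # wild-card assignment


def intersect(a: Optional[Interval], b: Optional[Interval]) -> Optional[Interval]:
    """Intersection of two closed intervals; None stands for the empty interval."""
    if a is None or b is None:
        return None
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None


def extend_scale_range(
    rng: Optional[Interval],
    assignment: Sequence[Optional[int]],
    k1: int,
    p: Optional[int],
    length_range: Callable[[int, int], Interval],
    angle_ok: Callable[[int, int, int, int], bool],
    component_range: Callable[[int, int, int, int], Interval],
) -> Optional[Interval]:
    """Scale range after assigning data fragment k1 to model fragment p.

    assignment[i] is the model fragment already given to data fragment i (i < k1).
    """
    if p is WILD:
        return rng  # the wild card leaves the scale range unchanged
    r = intersect(rng, length_range(k1, p))
    for i, q in enumerate(assignment):
        if r is None:
            break
        if q is WILD:
            continue
        if not angle_ok(i, k1, q, p):
            return None
        r = intersect(r, component_range(i, k1, q, p))
        r = intersect(r, component_range(k1, i, p, q))
    return r
```

Starting from the root with `(0.0, math.inf)` and calling this function at each new node reproduces the rules above; a `None` result corresponds to an inconsistent partial interpretation.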


3.4 Rotating Subparts 

More interesting classes of parameterized families include those in which parts of the 
object are allowed to move with respect to one another. A good example of such a 
family is a pair of scissors. A fixed size pair of scissors has a single degree of freedom, 
namely the rotation of the two blades relative to a common joint. We would like 
to be able to recognize the scissors, independent of the relative orientation of the 
blades, and without requiring a different model to represent each orientation. This 
class could further be extended to include scissors of different sizes. 

As we suggested earlier, this could be done by generalizing the constraints to 
directly take the free parameters into account. However, an easier approach is to 
break the object up into rigid subparts, and deal with each separately. We illustrate 
this with our scissors example. 

Suppose we treat each blade assembly as a separate part. We choose the location
of the common pin as the origin of the model coordinate frame. Now suppose that we
run our recognition system on each part, solving for a transformation $\theta_L, s_L, \mathbf{v}_{0,L}$
for the left blade and for a transformation $\theta_R, s_R, \mathbf{v}_{0,R}$ for the right blade. This
can proceed in a manner identical to that described previously. To ensure that
the two subparts are actually part of a common whole, we need to test that their
interpretations are globally consistent. This can be done by means of a simple set of
geometric constraints on their respective transformations. In this case, we require

$$s_L \approx s_R$$

$$\mathbf{v}_{0,L} \approx \mathbf{v}_{0,R}$$

Note that $\theta_L$ and $\theta_R$ could in principle take on any values. In practice, there is a
limited range of orientations that the scissors can take on, so that a third constraint
would be

$$|\theta_L - \theta_R| < C$$

where $C$ is some threshold on the range of rotations, and the arithmetic is done
modulo $2\pi$. An example is shown in Figure 4.

Figure 3. Examples of recognition when the free parameter is overall scale. The first part
shows a set of linear edge segments, the second shows the overlay of the located object,
and the third shows the located object in isolation.
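The global consistency test for the two blades can be sketched as follows. The tolerances `scale_tol` and `pos_tol`, which make "roughly equal" concrete, are hypothetical parameters, as is the function name; the rotation difference is wrapped modulo $2\pi$ before comparison with $C$.

```python
import math
from typing import Tuple

Vec = Tuple[float, float]


def blades_consistent(theta_l: float, s_l: float, v0_l: Vec,
                      theta_r: float, s_r: float, v0_r: Vec,
                      scale_tol: float, pos_tol: float, c: float) -> bool:
    """Global check that two located blades form one pair of scissors."""
    if abs(s_l - s_r) > scale_tol:  # s_L roughly equal to s_R
        return False
    if math.dist(v0_l, v0_r) > pos_tol:  # both blades place the common pin together
        return False
    d = theta_l - theta_r
    d = math.atan2(math.sin(d), math.cos(d))  # difference modulo 2*pi
    return abs(d) < c
```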

Note that the search can be done independently for each part, followed by
the application of the global constraints on each candidate pair of subparts. More
efficiently, we can first solve for the location of one of the subparts, and then use
that position to restrict the possible positions of the second part, thereby directly
removing some portions of the sensory data from consideration. We can also use
the solution for the first subpart to restrict the values of the free parameters, for
example, limiting the range of acceptable scale factors before beginning the search
for the second subpart.

Figure 4. Examples of recognition when the free parameters are overall scale and rotation
about a common axis. The first part shows a set of linear edge segments, the second shows
the overlay of the located object, and the third shows the located object in isolation.

We also need to add another level of backtracking search to our process. In 
particular, suppose we have found a candidate for the first rigid subpart, but we 
cannot find an acceptable candidate for the second one. In this case, we must 
backtrack to the point in the search for the first subpart at which we found the first 
candidate, and continue that search. If a second candidate for the first subpart can 
be found, then we can initialize a new search for the second subpart, and so on. 

In this case, the data structures used to represent an object become somewhat 
more complex than in the case of rigid objects. Here, an object representation must 
include: a list of rigid subparts, each of which is represented by a set of constraint 
tables as in the original recognition method; a list of the free parameters; a set of
procedures for verifying the global constraints applied once the subparts are located; and
a procedure for generating the restricted search area for a part, as a function of the poses
found for the other parts.
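One possible shape for such a representation, as a hypothetical Python sketch: the class and field names are illustrative, the constraint tables are left opaque, and poses are keyed by subpart name.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Sequence

Poses = Dict[str, Any]


def _whole_data(part: str, poses: Poses, data: Sequence[Any]) -> Sequence[Any]:
    """Default search area: no restriction."""
    return data


@dataclass
class RigidSubpart:
    name: str
    constraint_tables: Dict[str, Any]  # precomputed unary/binary model ranges


@dataclass
class ParameterizedModel:
    subparts: List[RigidSubpart]
    free_parameters: List[str]  # e.g. ["scale", "blade_angle"]
    post_constraints: List[Callable[[Poses], bool]] = field(default_factory=list)
    search_area: Callable[[str, Poses, Sequence[Any]], Sequence[Any]] = field(
        default_factory=lambda: _whole_data)

    def globally_consistent(self, poses: Poses) -> bool:
        """Apply every post-search constraint to the poses found so far."""
        return all(check(poses) for check in self.post_constraints)
```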


3.5 Subparts that Stretch 

As a third example, consider a family of tools, say a set of hammers with identical 
heads, but different handles. Again, we would like to extend our method to recognize 
both the identity and location of the hammer, and to determine which handle is 
attached. To model the handles, we assume that a generic shape (such as that shown 
in the left of figure 1) can stretch by some variable amount along an axis (in the case 
of the handle in figure 1 this is the axis of symmetry). The problem is to extend 
the search method to deal with constraints that are themselves parameterized. We 
do this as follows. 

Without loss of generality, we assume that the model part has been oriented so
that the axis of stretching is the $x$-axis in model coordinates. We let $\alpha$ denote the
amount of stretching along that axis, with $\alpha = 1$ designating the base case. Note
that $\alpha$ is likely to be restricted to some range of values, which may be specified
beforehand.

Consider first the constraints on the surface normals. In the case of rigid models, 
our constraint was that the angle between two data normals must be the same as 
the angle between the corresponding model normals, to within some error. In the 
case of stretching parts, the normals will vary relative to one another as a function 
of the stretching parameter a. Suppose we let denote the measured angular 
difference, we let e denote the allowed error range in measuring the angles, and we 
let denote the corresponding model angles, in model coordinates, for the base 

case a = 1. By appropriate algebraic manipulation, the following cases hold. 

• € {0,7T, y, — y}. In this case, we need only check that Oij G [<& p — — 

€,$p~ $q + «]• 

• $\Phi_p \in \{0, \pi\}$, $\Phi_q \notin \{0, \pi, \frac{\pi}{2}, -\frac{\pi}{2}\}$. In this case, the stretching factor is given by 

$$\alpha = -\frac{\tan\theta_{ij}}{\tan\Phi_q}.$$

A similar case holds when the roles of $i$ and $j$ are reversed. 

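• $\Phi_p \in \{\frac{\pi}{2}, -\frac{\pi}{2}\}$, $\Phi_q \notin \{0, \pi, \frac{\pi}{2}, -\frac{\pi}{2}\}$. In this case, the stretching factor is given 
by 

$$\alpha = \frac{1}{\tan\theta_{ij}\,\tan\Phi_q}.$$

A similar case holds when the roles of $i$ and $j$ are reversed. 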

• $\tan\Phi_p \neq 0$, $\tan\Phi_q \neq 0$, $\tan\theta_{ij} = 0$. In this case, $\alpha = 0$, which indicates an 
inconsistency. 

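• All other cases. The stretching factor is given by 

$$\alpha = \frac{\tan\Phi_p - \tan\Phi_q \pm \sqrt{(\tan\Phi_p - \tan\Phi_q)^2 - 4\tan\Phi_p\tan\Phi_q\tan^2\theta_{ij}}}{2\tan\Phi_p\tan\Phi_q\tan\theta_{ij}}.$$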
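The cases above can be traced back to a single relation. As a sketch, assume (consistently with the first case, in which axis-aligned normals are unaffected) that stretching by $\alpha$ along the $x$-axis maps a model normal at angle $\Phi$ to one at angle $\Phi'$ with $\tan\Phi' = \alpha\tan\Phi$, since normals transform by the inverse transpose of $\mathrm{diag}(\alpha, 1)$, and that $\theta_{ij}$ is matched against $\Phi'_p - \Phi'_q$. Then

$$\tan\theta_{ij} = \tan(\Phi'_p - \Phi'_q) = \frac{\alpha(\tan\Phi_p - \tan\Phi_q)}{1 + \alpha^2\tan\Phi_p\tan\Phi_q},$$

which rearranges to the quadratic

$$\alpha^2\tan\Phi_p\tan\Phi_q\tan\theta_{ij} - \alpha(\tan\Phi_p - \tan\Phi_q) + \tan\theta_{ij} = 0.$$

Setting $\tan\Phi_p = 0$ leaves $\tan\theta_{ij} = -\alpha\tan\Phi_q$; dividing the fraction through by $\tan\Phi_p$ and letting $\tan\Phi_p \to \infty$ leaves $\tan\theta_{ij} = 1/(\alpha\tan\Phi_q)$; and $\tan\theta_{ij} = 0$ with $\tan\Phi_p \neq \tan\Phi_q$ forces $\alpha = 0$. Otherwise, the quadratic formula yields the two roots of the last case.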

Note that the measurement $\theta_{ij}$ is actually a range of measurements, due to error in the 
sensory data. Thus, by applying the above computation over a sampling of values 
for $\theta_{ij}$ within this range, we can obtain a range of consistent values for the stretch 
factor, which we represent by 

$$[\alpha_{\ell,i,j,p,q},\ \alpha_{h,i,j,p,q}],$$

and we define 

$$\text{stretch-angle-constraint}(i, j, p, q) = [\alpha_{\ell,i,j,p,q},\ \alpha_{h,i,j,p,q}].$$
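The sampling procedure just described can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names, the fixed number of samples, the tolerance `TOL`, and the restriction to positive $\alpha$ are all assumptions. It also simplifies by returning the hull (minimum to maximum) of every candidate found, so two separate branches of the quadratic are merged into one interval.

```python
import math
from typing import List, Optional, Tuple

TOL = 1e-9  # assumed tolerance for "angle is a multiple of pi/2" tests


def stretch_candidates(theta: float, phi_p: float, phi_q: float) -> List[float]:
    """Positive stretch factors consistent with one measured angle theta_ij.

    Follows the case analysis for
    tan(theta) = alpha (tp - tq) / (1 + alpha^2 tp tq), tp = tan(phi_p), tq = tan(phi_q).
    """
    t = math.tan(theta)
    p_zero, q_zero = abs(math.sin(phi_p)) < TOL, abs(math.sin(phi_q)) < TOL
    p_inf, q_inf = abs(math.cos(phi_p)) < TOL, abs(math.cos(phi_q)) < TOL
    if (p_zero or p_inf) and (q_zero or q_inf):
        # First case: neither normal moves, so only the plain angle check applies.
        raise ValueError("both model normals are axis-aligned; alpha is unconstrained")
    if p_zero:                      # phi_p in {0, pi}
        cands = [-t / math.tan(phi_q)]
    elif q_zero:                    # same case with the roles of i and j reversed
        cands = [t / math.tan(phi_p)]
    elif p_inf:                     # phi_p in {pi/2, -pi/2}
        cands = [1.0 / (t * math.tan(phi_q))] if abs(t) > TOL else []
    elif q_inf:                     # same case with the roles of i and j reversed
        cands = [-1.0 / (t * math.tan(phi_p))] if abs(t) > TOL else []
    elif abs(t) < TOL:              # alpha = 0: inconsistent
        cands = []
    else:                           # general case: roots of the quadratic in alpha
        tp, tq = math.tan(phi_p), math.tan(phi_q)
        disc = (tp - tq) ** 2 - 4.0 * tp * tq * t * t
        if disc < 0:
            cands = []
        else:
            root = math.sqrt(disc)
            denom = 2.0 * tp * tq * t
            cands = [(tp - tq + root) / denom, (tp - tq - root) / denom]
    return [a for a in cands if a > 0]


def stretch_angle_constraint(theta: float, eps: float, phi_p: float,
                             phi_q: float, samples: int = 201
                             ) -> Optional[Tuple[float, float]]:
    """Hull [alpha_l, alpha_h] of candidates for theta sampled in [theta - eps, theta + eps]."""
    lo, hi = math.inf, -math.inf
    for k in range(samples):
        th = theta - eps + 2.0 * eps * k / (samples - 1)
        for a in stretch_candidates(th, phi_p, phi_q):
            lo, hi = min(lo, a), max(hi, a)
    return (lo, hi) if lo <= hi else None  # None: no consistent stretch factor
```

For example, with $\Phi_p = \pi/3$, $\Phi_q = \pi/6$ and a measurement generated by $\alpha = 1.5$, `stretch_candidates` returns both 1.5 and approximately 2/3, because the two roots multiply to $1/(\tan\Phi_p\tan\Phi_q) = 1$; a single pair of normals can therefore leave the stretch factor ambiguous until other constraints are applied.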

For the component constraint, we can perform a similar analysis. Suppose we 
are given two non-parallel data edges, each of which is designated by a base point 
$b_i$ and an end point $e_i$. These are chosen so that the tangent vector pointing from 
the base point to the end point is 90° clockwise from the normal vector $\hat{n}_i$ to the 
edge. For these two edges, we can compute the component of the vector $b_j - b_i$ in 
the direction of the normal vector $\hat{n}_i$, which we call 

$$d_{ij} = \langle b_j - b_i, \hat{n}_i \rangle,$$

and the component 

$$d'_{ij} = \langle e_j - b_i, \hat{n}_i \rangle.$$

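Then, given a corresponding pair of model edges $p$ and $q$, with base points $B_p$, $B_q$, 
end points $E_p$, $E_q$ and normal $\hat{N}_p$, we can compute similar components 

$$M_{pq} = \langle B_q - B_p, \hat{N}_p \rangle$$

and 

$$M'_{pq} = \langle E_q - B_p, \hat{N}_p \rangle.$$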
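The component definitions above can be computed directly. The following is a minimal Python sketch, assuming points are `(x, y)` tuples; the helper names are hypothetical. The same function serves for data edges, giving $(d_{ij}, d'_{ij})$, and for model edges, giving $(M_{pq}, M'_{pq})$. Because these are components along a unit normal, they are unchanged by any rotation and translation of the whole configuration, which is what allows data and model values to be compared before the pose is known.

```python
import math


def sub(u, v):
    """Vector difference u - v."""
    return (u[0] - v[0], u[1] - v[1])


def dot(u, v):
    """Euclidean inner product."""
    return u[0] * v[0] + u[1] * v[1]


def unit_normal(b, e):
    """Unit normal of edge b->e, chosen so that the tangent b->e is
    90 degrees clockwise from it (normal = tangent rotated 90 degrees CCW)."""
    tx, ty = sub(e, b)
    length = math.hypot(tx, ty)
    return (-ty / length, tx / length)


def components(b_i, e_i, b_j, e_j):
    """Return (<b_j - b_i, n_i>, <e_j - b_i, n_i>) for edges i and j."""
    n_i = unit_normal(b_i, e_i)
    return dot(sub(b_j, b_i), n_i), dot(sub(e_j, b_i), n_i)
```

For edge $i$ from $(0, 0)$ to $(1, 0)$ and edge $j$ from $(0, 2)$ to $(1, 3)$, the normal $\hat{n}_i$ is $(0, 1)$ and the components are 2 and 3; rotating and translating all four points leaves these values unchanged, whereas stretching the model along $x$ does not, which is why the stretch factor enters the constraint.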

We also let 

$$\sigma = \operatorname{sgn}\left(\langle (e_i - b_i)^{\perp},\ e_j - b_j \rangle\right),$$

and we let $\Delta x_i$ and $\Delta y_i$ denote the $x$ and $y$ components, respectively, of the vector 
$e_i - b_i$. Then the range of values of the stretch factor $\alpha$ is given by the range 
spanned by 

$$\alpha = \frac{\sigma\, d_{ij}\, |\Delta y_i|}{\sqrt{M_{pq}^2 - d_{ij}^2 (\Delta x_i)^2}}.$$

In fact, one must also allow for error in the measurements, which will yield a 
range of values for $d_{ij}$ and $d'_{ij}$, leading to a larger range of values for the stretching 
factor $\alpha$, again denoted 

$$[\alpha_{\ell,i,j,p,q},\ \alpha_{h,i,j,p,q}].$$

We define 

$$\text{stretch-component-constraint}(i, j, p, q) = [\alpha_{\ell,i,j,p,q},\ \alpha_{h,i,j,p,q}].$$

Finally, we can also alter the length constraint. In this case, 
stretch-length-constraint$(i, p)$ is likewise an interval of admissible stretch factors, 
whose bounds are computed from the model edge length $L_p$, the length error $\epsilon_l$, 
and the components $\Delta x_i$ and $\Delta y_i$. 


Given these unary and binary constraints, we can now modify our constrained 
search process. With each node of the search tree, we associate a range of consistent 
values for the stretch parameter, which we denote $[\alpha_\ell^{(k)}, \alpha_h^{(k)}]$, where $k$ indicates 
the level of the node in the tree. Suppose the search process is currently at some 
node at level $k$ in the interpretation tree, with a consistent partial interpretation 
given by 

$$\{(d_1, m_{j_1}), (d_2, m_{j_2}), \ldots, (d_k, m_{j_k})\}.$$

We now consider the next data fragment $d_{k+1}$ and its possible assignment to model 
fragment $m_{j_{k+1}}$, where $j_{k+1}$ varies from 1 to $n + 1$. 

The following rules hold. 

• If $m_{j_{k+1}}$ is the wild card match, then the new interpretation 

$$\{(d_1, m_{j_1}), (d_2, m_{j_2}), \ldots, (d_{k+1}, m_{j_{k+1}})\}$$

is consistent, and we continue downward in our search, setting 

$$[\alpha_\ell^{(k+1)}, \alpha_h^{(k+1)}] = [\alpha_\ell^{(k)}, \alpha_h^{(k)}].$$

• If $m_{j_{k+1}}$ is a linear edge segment, we let 

$$[\alpha_\ell^{(k+1)}, \alpha_h^{(k+1)}] = [\alpha_\ell^{(k)}, \alpha_h^{(k)}] \cap \text{stretch-length-constraint}(k+1, j_{k+1}).$$

If this new range is non-empty, then for all $i \in \{1, \ldots, k\}$ such that $d_i$ is a linear 
edge fragment, we let 

$$\begin{aligned}
[\alpha_\ell^{(k+1)}, \alpha_h^{(k+1)}] \leftarrow\ & [\alpha_\ell^{(k+1)}, \alpha_h^{(k+1)}] \\
&\cap\ \text{stretch-component-constraint}(i, k+1, j_i, j_{k+1}) \\
&\cap\ \text{stretch-component-constraint}(k+1, i, j_{k+1}, j_i) \\
&\cap\ \text{stretch-angle-constraint}(i, k+1, j_i, j_{k+1}).
\end{aligned}$$

If the resulting range $[\alpha_\ell^{(k+1)}, \alpha_h^{(k+1)}]$ is non-empty, then 

$$\{(d_1, m_{j_1}), (d_2, m_{j_2}), \ldots, (d_{k+1}, m_{j_{k+1}})\}$$

is a consistent partial interpretation, and we continue our depth-first search. 
Otherwise, the partial interpretation is inconsistent. In this case, we increment 
the model fragment index $j_{k+1}$ by 1 and try again, until $j_{k+1} = n + 1$. 

If the search process is currently at some node at level $k$ in the interpretation tree 
and has an inconsistent partial interpretation given by 

$$\{(d_1, m_{j_1}), (d_2, m_{j_2}), \ldots, (d_k, m_{j_k})\},$$

then it is in the process of backtracking. If $j_k = n + 1$ (the wild card), we backtrack 
up another level; otherwise, we increment $j_k$ and continue. 
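The search rules above can be sketched as a recursive depth-first search that carries the stretch range down the tree. This is an illustration under stated assumptions, not the original implementation: the constraint functions are hypothetical callables passed in by the caller, each returning a closed interval of admissible $\alpha$; model indices are 0-based, with index `n_model` standing for the wild card ($n + 1$ in the text); earlier data fragments matched to the wild card are skipped in the pairwise checks, since there is no model edge to compare against; and, for simplicity, every complete interpretation is collected rather than selecting among them.

```python
from typing import Callable, List, Optional, Tuple

Interval = Tuple[float, float]
# Hypothetical constraint signatures; each returns an interval of admissible alphas.
Unary = Callable[[int, int], Interval]             # (data i, model p)
Binary = Callable[[int, int, int, int], Interval]  # (data i, data j, model p, model q)


def intersect(a: Optional[Interval], b: Interval) -> Optional[Interval]:
    """Intersect two closed intervals; None stands for the empty interval."""
    if a is None:
        return None
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None


def interpret(n_data: int, n_model: int, length_c: Unary,
              component_c: Binary, angle_c: Binary,
              alpha0: Interval) -> List[Tuple[List[int], Interval]]:
    """Depth-first interpretation-tree search carrying [alpha_l, alpha_h]."""
    wild = n_model
    results: List[Tuple[List[int], Interval]] = []

    def extend(assign: List[int], rng: Interval) -> None:
        k = len(assign)  # 0-based index of the next data fragment
        if k == n_data:
            results.append((list(assign), rng))
            return
        for jk in range(n_model + 1):
            new: Optional[Interval]
            if jk == wild:
                new = rng  # wild card: range unchanged
            else:
                new = intersect(rng, length_c(k, jk))
                for i, ji in enumerate(assign):
                    if new is None:
                        break  # already inconsistent
                    if ji == wild:
                        continue  # no model edge to compare against
                    new = intersect(new, component_c(i, k, ji, jk))
                    new = intersect(new, component_c(k, i, jk, ji))
                    new = intersect(new, angle_c(i, k, ji, jk))
            if new is not None:
                assign.append(jk)
                extend(assign, new)
                assign.pop()  # backtrack, then try the next model index

    extend([], alpha0)
    return results
```

Returning from the recursive call plays the role of the explicit backtracking step described in the text: the loop simply moves on to the next value of $j_{k+1}$, and the wild card is tried last.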

Figure 5 shows an example of a set of overlapping handles (taken from the 
family illustrated in Figure 1). Each instance of one of the handles is identified and 
located, including determining the actual value of the stretching parameter. 


3.6 Combining Parameterizations 

It is useful to be able to recognize objects that combine different types of 
parameterizations. For example, consider a pair of shears, which has both a rotational 
freedom between the two blades and a stretching freedom along the axis of each 
blade. We can combine the methods described in Sections 3.3 and 3.4 to deal with 
this more general problem. An example is shown in Figures 6-9. Here, the system 
correctly solves for the position and orientation of the object, the angle of rotation 
between the blades, and the stretching factor of the blades. 

One could also combine stretching and scaling parameters in a single family 
of objects. This is equivalent to allowing independent stretching in two orthogonal 
directions. In this case, there are two parameters for which to solve, so each 
constraint specifies only a relationship between the parameters. We can define a 
two-dimensional parameter space, spanned by the stretching parameters in the x 
and y directions. Initially, this space will contain a region of feasibility, defined by 
any limits on the range of parameters. As we add each constraint in our search 
process, it defines a new region of the space, and the intersection of that region with 
the current region of feasibility determines the range of feasible parameter values 
consistent with the current interpretation. As in the earlier cases, if the region of 
feasibility becomes empty, the interpretation is inconsistent. 

Determining the region of feasibility defined by the constraints is somewhat 
delicate. The length constraint, for example, yields an ellipse centered at the 
origin of the parameter space, whose complement demarcates the feasible region. The 
angle constraint yields a feasible region that consists of a contiguous family of rays 
passing through the origin of the parameter space. In principle, one could use such 
constraints, together with procedures for intersecting regions in the plane, to 
implement a recognition system for parts that stretch, scale and rotate at the same time. 
We have not yet done so. 
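Since this two-parameter extension was not implemented, the following is only a hypothetical Python sketch of how the two region shapes might be intersected. It assumes stretch factors $(\alpha_x, \alpha_y)$, a model edge with base-case components $(\Delta X, \Delta Y)$, a data edge of length $l$ with length error $\epsilon$, and a stretched edge length $\sqrt{\alpha_x^2\Delta X^2 + \alpha_y^2\Delta Y^2}$ that must be at least $l - \epsilon$; this gives the region on or outside an ellipse centered at the origin. It further assumes that an angle constraint reduces to a range $[r_{lo}, r_{hi}]$ of the ratio $\alpha_x/\alpha_y$, i.e. a wedge of rays through the origin. Regions are approximated by a sampled grid rather than exact planar region intersection; all names and grid settings are illustrative.

```python
from typing import Callable, Iterable, Set, Tuple

Point = Tuple[float, float]
Constraint = Callable[[float, float], bool]


def length_feasible(ax: float, ay: float, dX: float, dY: float,
                    l: float, eps: float) -> bool:
    """On or outside the ellipse (dX ax)^2 + (dY ay)^2 = (l - eps)^2."""
    return (dX * ax) ** 2 + (dY * ay) ** 2 >= (l - eps) ** 2


def angle_feasible(ax: float, ay: float, r_lo: float, r_hi: float) -> bool:
    """Inside the wedge of rays through the origin with r_lo <= ax / ay <= r_hi."""
    return ay > 0 and r_lo * ay <= ax <= r_hi * ay


def feasible_cells(constraints: Iterable[Constraint], lo: float = 0.25,
                   hi: float = 4.0, n: int = 64) -> Set[Point]:
    """Intersect constraint regions over an n-by-n grid of (alpha_x, alpha_y)."""
    step = (hi - lo) / (n - 1)
    cells = {(lo + i * step, lo + j * step) for i in range(n) for j in range(n)}
    for c in constraints:
        cells = {p for p in cells if c(*p)}
        if not cells:
            break  # empty region of feasibility: interpretation inconsistent
    return cells
```

A grid keeps the sketch short and handles the non-convex ellipse complement without special cases; an exact implementation would instead intersect the conic and wedge boundaries, as the text suggests.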
Figure 5. Examples of recognition when the free parameter is stretching along an axis. 
The first part shows a set of linear edge segments, the second shows the overlay of the 
located objects, and the third shows the located objects in isolation. 


One drawback of the system presented here is that the analysis of how to 
parameterize an object model is done by hand, rather than by automatically determining 
the parameterization for a model [Brooks 81]. 



Figure 6. Example of recognition with different types of parameterizations. The object has 
a rotational free parameter and a stretching free parameter. The figure shows the original 
image. 

4. Relation to previous work 

The literature on object recognition stretches over a period of at least twenty years. 
An extensive (70-page) review of much of this literature for 3D objects can be found 
in [Besl and Jain 1985]. A survey of model-based image analysis systems can be 
found in [Binford 1982]. 

In terms of the approach described here, a number of authors have shared our 
view that recognition can be structured as an explicit search for a 
match between data elements and model elements [Ayache and Faugeras 86, Baird 
85, Bolles and Cain 82, Bolles, Horaud and Hannah 83, Faugeras and Hebert 83, 
Goad 83, Ikeuchi 87, Lowe 86, Stockman and Esteva 84]. Of these, the work of 
Bolles and his colleagues, Faugeras and his colleagues, and Baird is closest 
to the approach presented here. 

The interpretation tree approach is an instance of the consistent labeling problem 
that has been studied extensively in computer vision and artificial intelligence 
[Waltz 75, Montanari 74, Mackworth 77, Freuder 78, 82, Haralick and Shapiro 79, 
Haralick and Elliott 80, Mackworth and Freuder 85]. This paper can be viewed as 
suggesting a particular consistency relation (the constraints on distances and angles) 
and exploring its performance in a wide variety of circumstances. An alternative 
approach to the solution of consistent labeling problems is the use of relaxation. 
A number of authors have investigated this approach to object recognition [Davis 
79, Bhanu and Faugeras 84, Ayache and Faugeras 82]. These techniques are more 
suitable for implementation on parallel machines. 

Figure 7. Example of recognition with different types of parameterizations. The edge 
fragments extracted from the image in Figure 6 are shown. 

The literature on recognition of parameterized objects is much smaller. The 
best-known system is probably ACRONYM [Brooks 81], which also attacks the 
recognition problem by means of constraints to reduce ranges of parameterized 
variables. One of the main differences is that Brooks’ system dealt with both rigid 
subparts and constraints that incorporated free parameters at runtime. In the 
approach presented here, we are compiling special cases of parameterization by hand 
in advance, so that the runtime portion of the problem is much simpler and uses 
stronger constraints. This makes our system somewhat less general than Brooks’, 
although it does benefit from a simpler recognition engine. 
Figure 8. Example of recognition with different types of parameterizations. The solution 
is overlaid on the edge fragments extracted from the image in Figure 6. 